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powerful tool to process interferometric data with missing phase information. This 
paper intends to revisit the application of self-calibration to Optical Long Baseline 
Interferometry (OLBI ). We cast rigorously the OLBI data processing problem into 
the self-calibration framework and demonstrate the efficiency of the method on 
real astronomical OLBI dataset. © 2008 Optical Society of America 
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1. Introduction 

Optical Long Baseline Interferometry (OLBI ) aims to combine light collected by widely separated 
telescopes to access angular resolutions beyond the diffration limit of each individual aperture. 
Long-baseline interferometers measure a discrete set of spatial frequencies of the observed object, 
or Fourier data. Due to instrumental complexity, current interferometers recombine only a few 
telescopes, and even several nights of observation lead to a very limited number of Fourier data; 
moreover, due to the atmospheric turbulence, it is very difficult to get reliable phase information 
from ground based interferometry [1]. Hence OLBI has to deal with severe under-determination 
and missing phase information. 

The classical answer to under-determination is to use a parametric approach, i.e. to search for 
an object entirely described by a small set of parameters (for instance a circular object with a para- 
metric attenuation profile). With a "good model", such an approach allows a reliable and precise 
estimation of astrophysical parameters. A good model should limit as much as possible the num- 
ber of free parameters, while allowing a description of all the object's features, because parametric 
inversion cannot reveal unguessed features. The x 2 fit is often used as a model quality diagnosis, 
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since an inadequate model will often result in a poor fit to the data, thus revealing that a new model 
(with more parameters or different parameters) is needed. However, it does not reveal which new 
model must be adopted. 

As progress in instrumental issues gives access to better frequency coverage, i.e. to potentially 
finer descriptions of the object, the choice of the model becomes more difficult. An alternate and 
complementary approach is then non-parametric reconstruction, which we will call "optical long 
baseline interferometric imaging" (OLBII). Imaging means that the object is described by a large 
set of parameters, such as coefficients of the object's decomposition in some spatial functional 
basis, while under-determination is tackled by regularization tools. Imaging is useful to understand 
the structure of a complex object when prior information is limited. 

From the beginning, OLBII has been influenced by the remarkable techniques developed in 
radio-interferometry with very large baselines (VLB I) [2]. For instance, the "WIPE" OLBII tech- 
nique of A. Lannes et al.[3] is inspired by the well-known CLEAN method [4]. As regards the 
missing phase problem, the self-calibration technique proposed in radio-interferometry by Corn- 
well and Wilkinson [5] underlies recent works in OLBII [6]. 

This paper intends to revisit the application of self-calibration to OLBI . Our contribution is 
three-fold: 

1. we cast rigorously the OLBI data processing problem into the self-calibration framework, 
with consideration of the second-order statistics of the noise; 

2. we propose WISARD (for Weak-phase Interferometric Sample Alternating Reconstruction 
Device), a self-calibration algorithm dedicated to OLBII, which uses the proposed data 
model within a Bayesian regularization approach; 

3. we demonstrate the efficiency of WISARD on real astronomical OLBI dataset. 

The paper is organized as follows: Section 2 describes the observation model of OLBI , briefly 
presents a Bayesian approach and discusses the main problems that are encountered because of the 
incomplete OLBI data. Section 3 is devoted to the derivation of a specific myopic model, which 
achieves a good approximation of the data model and leads to self-calibration techniques. One 
such technique, WlSARD, is proposed in Section 4. Results of WlSARD on simulated and real 
astronomical datasets are presented in Section 5. Our conclusions are given in Section 6. Most 
mathematical derivations are gathered in the Appendices. 

2. Realistic observables in optical long baseline interferometry 

2.A. Ideal interferometric data 

Here we describe the ideal data, i.e. without aberrations, noise or turbulence effects, produced by 
a N t -telescope interferometer observing a monochromatic source with wavelength A. The bright- 
ness distribution of the source is denoted £ being angular coordinates on the sky. Individual 
telescopes Tk of the interferometer are located at three-space positions OTk, and we denote Vk(t) 
the projection of OTk onto V, the plane normal to the pointing direction. Because of the Earth's 
rotation, the pointing direction changes during an observing night, so these projected vectors are 
time dependent. 
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Each pair (T k ,Ti) of telescopes yields a fringe pattern with a 2D spatial frequency u k i(t) = 



, where u M (t) is the baseline 



u kl {t)= ri {t)-r k {t), (1) 



i.e. the projection of the vector T fc TJ onto V. 

Measuring the position and contrast of these fringes yields a phase 0^ ta (t) and an amplitude 
a^ ta (t), which can be grouped together in a complex visibility 

vT{t) = aT{t¥^ (t) . (2) 

According to the Van Cittert-Zernike theorem [7], complex visibilities are ideally linked to the 
normalized Fourier Transform (FT) of x(£) at the 2D spatial frequency v k i{t) through 

data,,, m FT[s(Q] (U kl (t)) 

Vki (t)- Vkl (t) FT[xm{Q) ■ (3) 

The instrumental visibility rjki(t) accounts from the many potential sources of visibility loss: 
residual perturbations of the wavefront at each telescope, differential tilts between telescopes, dif- 
ferential polarization effects, non-zero spectral width, etc. In practice, the instrumental visibility is 
calibrated on a star reputed to be unresolved by the interferometer before the object of interest is 
observed, and is compensated for in the pre-processing of the raw data. Thanks to this calibration 
step, we replace r)ki(t) by 1 in equation (3). 

For the sake of clarity, we consider a complete iVt-telescope array in what follows, i.e. one in 
which all the possible two-telescope baselines can be formed simultaneously, and a non-redundant 
interferometer configuration, where each baseline provides a different spatial frequency. Extension 
to incomplete and redundant settings is straightforward. Thus, at each time t, there are 

*-(?)- 

complex observation equations such as (3). 

Let us briefly introduce the discretized observation model. The sought brightness distribution x 
is represented by the coefficients x of its projection onto some convenient spatial basis (box func- 
tions, sine's, wavelets, prolate spheroidal functions, etc.). The normalized discrete-continuous 
Fourier matrix H(t) maps the chosen discrete spatial representation into the real-valued instanta- 
neous frequency coverage {i/ki(t))}i<k<i<N t , and we further define 



a(x, t) — \H(t)x\ 
4>(x,t)^zrg{H(t)x} 



(5) 



2.B. Effect of atmospheric turbulence on short-exposure measurements 

At optical wavelengths, atmospheric turbulence affects phase measurements through path length 
fluctuations. The statistics of these fluctuations can be described by a time scale parameter, the 
coherence time r , typically around 10 milliseconds, and by a space scale parameter, the Fried 
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parameter r [8]. We assume that the diameter of the elementary apertures is small relative to 
the Fried parameter, or that each telescope is corrected from the effects of turbulence by adaptive 
optics. The remaining turbulent effects on the interferometric measurements can be seen as a de- 
lay line between the two telescopes T k and TJ, which affects short-exposure phase measurements 
through an additive differential piston <pi(t) — (p k (t): 

<P d kf"(t) = <f>ki(x, t) + <Pi(t) - <p k (t) + noise [2tt] (6) 
or, in a matrix formulation: 

data (t) = 0(aj s t) + B<p(t) + noise [2tt] (7) 

where N b x N t operator B, called the baseline operator, is defined in Appendix A. 

Because the differential pistons are zero-mean, one might think that the object phase <j>(x, t) 
could be recovered from (7) by averaging over many realizations of the atmosphere. However, 
for a long baseline relative to the Fried parameter, the optical path difference between apertures 
introduced by turbulence may be very much greater than the observation wavelength and thus lead 
to random pistons much larger than 2n. The 27r-wrapped perturbation that affects the phase (7) 
is then practically uniformly distributed in [0, 2w]. In consequence, averaging the short-exposure 
phases measurements (7) does not improve the signal-to-noise ratio. 

In phase referencing techniques (see [9]), the turbulent pistons are measured in order to subtract 
them in (7). However powerful and promising, these methods require specific hardware and are 
not feasible for all sources. The only other way to obtain exploitable long-exposure data then is to 
form piston-free short-exposure observables before the averaging. 

2.C. Piston-free short-exposure observables 

Piston-free short-exposure phase observables are quantities /(0 data (t)) in which the turbulent term 
Bip{t) cancels out: 

f(cj> d ^(t)) = f(<t>(x,t) + Btp{t)) = f(<t>(x,t)). (8) 

For an interferometric array of 3 telescopes or more, the closure phases [10] are one famous ex- 
ample, in which / is a linear operator performing triple-wise summation of the phases. For any set 
of three telescopes (Tk, 7], T m ) the short-exposure visibility phase data are 

<f>tf a (t) = <j) k i(x,t) + tpi(t) - Mt) + noise [2tt] 

tfTit) = <Pim(x, t) + <p m (t) - <p t (t) + noise [2tt] (9) 
& (*) = 4>mk{x, t) + <p k (t) - <p m (t) + noise [2tt] 

and the turbulent pistons cancel out in the closure phase defined by : 

fi£(t) = <f>if%t) + <pfT(t) + &(t) + noise [2tt] 

= (f) k i{x, t) + (pi m (x, t) + (p mk (x, t) + noise [2tt] (10) 

= (3 Um {x,t) + noise [2vr] . 
We have the following properties : 
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• the set of all three-telescope closure phases that can be formed using a complete array is 
generated by the (N t — l)(N t — 2)/2 closure phases Pf^^t), k < I, i.e. the closure phase 
which include telescope T x (indeed, /3 d ff£ = /? d ^ ta + /3f^ - (3 d ^). In what follows, these 
canonical closure phases are grouped together in a vector f3 data and C denotes the linear 
closure operator such that C0 data = /3 data (see appendix A). 

• if / is a continuous differentiable function verifying property (8), then 

M)=g(C<l>), (11) 

where g is some continuous differentiable function. In other terms, there is essentially no 
operator other than the closure operator that cancels out the effect of turbulence on short- 
exposure visibility phases (this property holds only in the monochromatic case). 

The proof of the second property is given in appendix B. 
2.D. Long-exposure observables data model 

To minimize the effect of noise, one is led to average short-exposure measurements, into long- 
exposure observables, chosen so that they are asymptotically unbiased. The averaging time must 
be short enough w. r. t. the earth rotation so that the baseline does not change, and long enough to 
reach an acceptable Signal-to-Noise Ratio (SNR). The averaged quantities are generally: 

• averaged squared amplitudes s data, (t) = (a dajta (t + t) 2 ) t , 

• averaged bispectra V d f?(t) = (y d f a (t + r) • y d f*(t + r) ■ y data (t + r)) T , k < I. 

Squared amplitudes are preferred to amplitudes because their bias can be estimated and subtracted 
from the data. Short-exposure bispectra are continuous differentiable functions verifying prop- 
erty (8), and so correspond to a particular choice of g in (1 1). In absence of noise, the averaged 
bispectrum amplitudes are redundant with the averaged squared amplitudes. Although they should 
be useful in low SNR conditions, averaged bispectrum amplitudes are not considered in what fol- 
lows. The averaged bispectrum phases (3 d ^ a (t), k < I constitute unbiased long-exposure closure 
phase estimators. As such, they are linked to the object phases (j>(x, t) through: 

/3 data (t) = C(f>(x, t) + noise [2tt] (12) 

It is shown in appendix A that the kernel of the closure operator C is of dimension (N t — 1). Hence 
equation (12) implies that optical interferometry through turbulence has to deal with a partial phase 
information. This result can also be obtained by counting up phase unknowns for each instant of 
measurement t: there are N t (N t — l)/2 unknown object visibility phases and (N t — l)(N t — 2)/2 
observable independent closure phases, which results in (N t — 1) missing phase data. As well 
known in the radio-interferometric community, the more apertures in the array, the smaller the 
proportion of missing phase information will be. 

The long-exposure observables considered in this paper are noisy squared amplitudes s data (t) 
and closure phases /3 data (t). The only statistics usually available are the variances for each observ- 
able (as, for instance, in the OIFITS data exchange format [11]). The assumed noise distribution is 
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consequently 0-mean white Gaussian: 




= a 2 (x,t) + s™ ise (t), s aoiBe (t)~Af(0,R sit) ) 
= Ccj>(x, t) + f3 noisc (t) [2k], /3 noisc (t) ~ M (0, R m ) 

The matrices R s (t) and Rptt) are diagonal, with variances related to the integration time, although 
correlations may be produced by the use of the same reference stars in the calibration process 12 . 

2.E. Bayesian reconstruction methods 

This approach first forms the anti-log-likelihood according to the model (13) 

Jd ata (£c) = J- J***( X ,t) = J2^)^)+X 2 m (x) (14) 

t t 

where xl^i 00 ) denotes the \ 2 statistic (s data (i) — a 2 (x, t)) T R~m (s data (t) — a 2 (x,t)). Closure 
terms X^^x) are a weighted quadratic distance between complex phasors 13 instead of a Chi- 
2 statistic over closure phase residuals. One then associates J data with a regularization term to 
account for the incompleteness of the data in such inverse problems and minimizes the composite 
criterion 

J(x) = J data (x) + J prioi (x) (15) 

under the following constraints: 

V(p,g), x(p,q) > 

^)x(p,g) = l. ( 16 ^ 

p,q 

The first requires positivity of the sought object, the second is a constraint of unit flux. Indeed, 
fringe visibilities are by definition flux-normalized quantities (i.e. normalized by the Fourier trans- 
form of the object at the null frequency, see Eq. 3), so the data are independent of the total flux of 
the sought object (of course an interferometer is sensitive to the total flux of the source, but this 
last value is not contained in the fringe visibility itself). 

The regularization term JP rior is chosen to enforce some properties of the object which are known 
a priori (smoothness, spiky behavior, positivity, etc.) and should also ease the minimization. Simple 
and popular regularization terms are convex separable penalizations of the object pixels (i.e. white 
priors) or of the object spatial derivatives (for instance first-order derivative or gradient). In what 
follows, we quickly describe the prior terms used in this paper. These priors are more extensively 
described and compared in [14]. For a general review on regularization, see [15]. 

Entropic priors belong to the family of white priors and often allow to obtain a clean image while 
preserving its sharp spiky features, whereas quadratic penalization tends to soften the reconstructed 
map. The white quadratic-linear (or L 2 L™) penalization given by: 

L& ( .) = ez^-*( 1 + T ¥) <17) 

p,q ^ ' 
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that we use in section 5 leads to a kind of entropic regularization, in the sense of [16]. We propose 
a nominal setting of the two parameters 5 and s: 

s = l/N pix ; 6=1. (18) 

As regards regularization based on the object's spatial derivatives, we shall consider here only 
quadratic penalization, but convex quadratic-linear L 2 L\ penalization functions could also be in- 
voked. 

Ref. [17] is one of the works that adopt such a Bayesian approach for processing optical long 
baseline interferometry, using a constrained local descent method to minimize (15). A convex data 
criterion J, i.e. such that J(k ■ x\ + (1 — k) -x 2 ) < k ■ J(x\) + (1 — k) ■ J(x 2 ), Vxi,x 2 , Vfc G [0, 1], 
has no local minima, which makes the minimization much easier. Unfortunately, the criterion J is 
non-convex. To be more precise, the difficulty of the problem can be summed up as follows: 

(i) The small number of Fourier coefficients makes the problem under-determined. Here the 
regularization term and the positivity constraint can help by limiting the high frequencies of 
the reconstructed object [6]. 

(ii) Closure phase measurements implies missing phase information and makes the Fourier syn- 
thesis problem non-convex. Adding a regularization term does not generally correct the prob- 
lem [18]. 

(iii) Phase and modulus measurements with additive Gaussian noise leads to a non-Gaussian 
likelihood and a non-convex log-likelihood w.r.t. a?. As a consequence, even with no missing 
phases, some approximation of the real observable statistics is necessary to get a convex data 
fidelity term. This data conversion from polar to Cartesian coordinates, which is commonly 
used in the field of radar processing [19], has been studied only recently in OLBI [20]: see 
section 3.C. 

These characteristics imply that optimizing J by a local descent algorithm can only work if the 
initialization selects the "right" valley of the criterion. The design of a good initial position is very 
case-dependent, and will not be extensively addressed here. The other key aspects are then the 
followed path, i.e. the minimization method, and the shape of the function to minimize, i.e. the 
behavior of the criterion x i— ► J(x). This paper addresses both aspects: 

• we design a specific OLBI criterion J{x, a) where two sets of variables appear explicitly, 
one in the spatial domain x, describing the sought object, and another in the Fourier phase 
domain a, which accounts for the missing phase information. This specific criterion is de- 
signed to solve (iii), i.e. so that for a known ex, the criterion is convex w. r. t. x. In other 
words, if we had all the complex visibility phase measurements instead of just the closure 
phases, our criterion x i— > J(x, a) would be convex; 

• we adopt an alternate minimization method, working on the two sets of variables. 

This approach can be related to "myopic" approaches of some inverse problems, where missing 
data concerning the instrumental response are modeled and sought for during the inversion [21]. 
Alternate minimization methods are inspired by self-calibration methods in radio-interferometry, 
and have been used in optical interferometry by Lannes et al. [6]. However, the criterion used in 
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Ref. [6] was essentially imported from radio-interferometry and does not match the OLBI data 
model (13). Our main contribution is to derive a criterion which accounts for the data model (13), 
while allowing an efficient alternate minimization. This construction is the subject of the next 
section. 

3. An equivalent myopic model for self-calibration 

The aim of this section is to approximate the data model of equation (13): 

/3 data (t) = Ccj>(x,t) + [3™ isc (t) [2tt], (3™ isc (t)~Af(o,R m ) (20) 
by a myopic linear model with additive complex Gaussian noise of the following form: 

y data (t) = T a{t) ■ H(t)x + y noisc (t) (21) 

where operator • denotes componentwise multiplication, and Fait) is a vector of phasors depending 
on phase aberration parameters a(t), which are defined in Sec. 3.B. This will be done in three steps: 

• Sec. 3. A is devoted to the derivation of the observation model for the pseudo amplitude term 
a data (t) from (19); 

• Sec. 3.B is devoted to the derivation of the observation model for the pseudo phase term 
data (t) from (20); 

• Sec. 3.C shows how to combine pseudo phase and pseudo amplitude models in a complex 
model such as equation (21) while solving problem (iii) of Sec. 2.E. 

3.A. Pseudo amplitude data model 

In Eq. (19), we have supposed a Gaussian distribution for s data (t) around s(x, t), which is ques- 
tionable, since squared amplitudes should be non-negative. However, such a statistic model is 
acceptable provided that the probability of a negative component of s data (t) is very weak. For 
uncorrected measurements, this assumption correspond to mean values much greater than the cor- 
responding standard deviation. Appendix D page 19 shows how to build the mean and co variance 
matrix of the square root of such a distribution. The mean vector is taken as the pseudo amplitude 
data a data (i), and the covariance matrix called R a (t)- 

The observation model (19) can then be approximated by the following amplitude pseudo data 
model: 

a data (t) = a (x, t) + a noisc (t), a noise (t) ~ AT (0, R a(t) ) . (22) 
3.B. Pseudo phase data model 

We start from a generalized inverse solution to the phase closure equation of (20). The generalized 

inverse C f of C, defined by C f = C T [CC T ] ~\ is such that CC ] = Id. By applying it on all 
the terms of (20), we obtain 

C f /3 data (t) = C ] Ccf){x, t) + C f /3 noisc (t) + 2tt C ] k (23) 



8 



where n is a vector of integers to account for the fact that each phase component is measured 
modulo 2n. We define 

data (t) 4 c f /3 data (t) (24) 
kcr (t) 4 (&C-Id)tf>(x,t) + 2nC ] Ki (25) 

and obtain 

data (i) = (f>( Xi t ) + kcr (t) + C^(3 noisc (t) (26) 
Vector ker (t) belongs to the 27r-wrapped kernel of operator C : 

C<t> ker {t) = (QC^C - C)<f>(x,t) + 2ttQC^k 

=Id =Id 

= 2ktt 
= [2tt] 

As shown in appendix C, if kcr = [2ir], there exists a real vector a(t) of dimension N t — 1 
such that kcr (t) = Ba(t) [2tt], where i? is obtained by removing the first column of operator B. 
So we have: 

data (t) = (f)(x, t) + Ba(t) + C^f3 noisc {t) [2tt] (27) 
Now the problem is that C'/3 noise (t) is a zero mean random vector with a singular covariance 



matrix 



R %(t) — C t i2 /3( i)C tT . 



To obtain a strictly convex log-likelihood, we have to approximate this term by a proper Gaussian 
vector 4> noisc {t), with an invertible covariance matrix Rm) chosen so as to correctly fit the second 
order statistics of the noise in the phase closure measurement equation (20). This last requirement 
can be written as the following equation: 

CR^C^ 1 = R/3(t)- (28) 

In other words, we are led to choose an invertible covariance matrix R^t) so as to mimic the 
statistical behavior of the closures, which is expressed by (28). 

We propose to modify matrix R°M t ) by setting its non diagonal components to 0, i.e. to use the 
following diagonal matrix: 

{^)},= n 1 ^ ...J, (29) 
[0 if j 

The factor 3 allows us to preserve the total weight of the phase term in the log-likelihood by 
satisfying the condition: 

Yl \{ R ^)}ij\ = J2\i R l(t)}ij 

There are several ways of choosing R^rt), and we propose this particular choice without claiming 
it is optimal. Note that the myopic model derived in what follows can accomodate to any choice of 
a proper (i.e. invertible) covariance matrix R^ t y 
With equations (24), (27) and (29), we obtain the visibility phase pseudo data model: 

data (t) = <f>(x, t) + Ba(t) + <f> noisc (t) [2tt], </> noisc (t) ~ N (0, R m ) . (30) 
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3.C. Pseudo complex visibility data model 

Gathering equations (22) and (30), we have finally approximated the data model (19-20) by 

a data (t) = a{x, t) + a noise (t), a noise (t) ~ J\f (0, R a(t) ) . 

<f> d ^(t) = cj>(x,t) + B a (t) + <f>™ isc (t)[27t}, noisc (t)~Af(O,i^ (t) ). ( " } 

We form pseudo-complex visibility measurements y data (t) defined by: 

^(^a^fil-^O (32) 

The approach proposed in [20], which we recall and generalize in Appendix E, is based on an 
approximated complex visibility data model 

y ****(t) = H(t)x ■ e lSa ® + y noise (t) (33) 

This is exactly the sought model stated the beginning of the present section in equation (21), with 
J~a(t) — e lBa ^'. We now define the myopic observation model as follows: 

y m (x,a(t)) = H{t)x-e iBa ^. (34) 

As shown in Appendix E, the mean value y nomc (t) and co variance matrix i? y noi SC ( t ) of the addi- 
tive complex noise term y nmse (t) are carefully designed so that the corresponding data likelihood 
criterion is convex quadratic w. r. t. the complex y m (x, a(t)) while remaining close to the real 
non convex model. To illustrate these properties, we consider one complex visibility and plot in 
the complex plane the distribution of 7/ data (t) around y m (x, at(t)) for the true noise distribution 
— i.e. a polar Gaussian noise in phase and modulus — and our cartesian Gaussian approximation 
(see Fig. 1) In particular, the "elliptic" covariance matrix we propose (which yields elliptic contour 
plots in Fig. 1), is preferable to the more classical "circular" approximation that appears in previ- 
ous contributions on OLBI [22]. The latter can be described by half as many parameters as needed 
for the elliptic one (one radius for a circle, instead of a short axis and a long axis for an ellipsis), 
but is clearly less accurate [20] (such a noise statistics description has also been investigated for 
the complex bi spectra in the OIFITS data exchange format [11]). 

From Eq. (33), we build a Chi-2 statistics over real and imaginary parts of the observation 
equation 

3m{y data ({) -y m (x,a(t)) - y noisc (t)} 

%m{y data (t)-y m (x,a(t))-y noise (t)} ' 
And we finally propose the myopic goodness-of-fit criterion: 

t t 
We can now design a myopic Bayesian approach to the reconstruction problem, by combining the 
data term with a regularization term along the lines of Section 2.E: 

J{x, a) = J***(x, a) + J^' mr {x). (36) 

The next section describes an alternate minimization technique applied to the regularized crite- 
rion (36). 



X 2 y{t )(x,a(t)) 



10 




o 



Fig. 1. Contour plots of a polar Gaussian distribution and of its Cartesian Gaussian 
approximation 

4. WlSARD 

In this section, we describe WlSARD, standing for Weak-phase Interferometric Sample Alternating 
Reconstruction Device, a self-calibration method for OLBII. 

4.A. Global structure of WlSARD 
WlSARD is made of four major blocks: 

• a first block recasts the raw data (i.e. closure phases and squared visibilities) in myopic data 
(i.e. phases and moduli) as described in sections 3. A and 3.B; 

• a second "convexification block" computes a Gaussian approximation of the pseudo visibil- 
ity data model as described in section 3.C; 

• a third block builds a guess for the object x and aberrations a (i.e. a good starting point); 

• finally, the self-calibration block performs the minimization of the regularized criterion (36), 
under the constraints (16). It alternates optimization of the object for given aberrations, and 
optimization of the aberrations for the current object. 

The structure of WlSARD is sketched in Fig. 2. The principles which underline the three first 
blocks of WlSARD have been described in previous Sections, while details on the self-calibration 
minimization are gathered in the next one. 
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a data ^data 
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Myopic approx. data r y data R 



Initialization : 

guess x a 



j. 



Self-calibration 



Aberration step 



Object step 



Reconstruction 



Fig. 2. WlSARD algorithm loop 



4.B. Self-calibration block 

Minimization w. r. t. x The criterion l 7 data (cc, a) we have derived is quadratic hence convex 
w.r.t. the object x. Hence, the minimization versus x does not raise special difficulties. 

Minimization w. r. t. a J dat& (x, ex) is the sum of terms involving only measurements obtained 
at one time instant t (equation 35): 



J 



data / 



x, a) 



J daU (x,ct(t),t) 



t 



Because the time between two measurements is much greater than the turbulence coherent time 
(around 10 ms), aberrations at(t) at two different instants are statistically independent. We can 
then solve separately for each set of ct(t), which dramatically reduces the complexity of the mini- 
mization. The number of a(t) components to solve for is (N t — 1) and the minimization is delicate, 
as the criterion exhibits periodic structures which have been studied in [22]. 

However, exact minimization is affordable for a 3-telescope interferometric array. In this case 
we have to perform several 2-parameter minimizations, and each one can be efficiently initialized 
by an exhaustive search on a 2-D grid, which ensures we avoid local minima. On the other hand, 
when N t gets high enough, e.g. 6, then number of ct(t) to solve for, e.g. 5, gets small compared 
to the number of closure phases available, e.g. 15. With a 3-telescope array, 2/3 of the phase infor- 
mation is missing, whereas with a 6-telescope array, only 1/3 of the phase information is missing. 
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In this last case, which corresponds to the processing of synthetic data presented Sec. 5. A, the 
reconstructions were straightforward, and no effects of the local minima in a. were witnessed. 

In other words, coping with the ambiguities in a, for instance with the specific criterion proposed 
in [22], may be necessary only for N t — 4 or Nt — 5. For N t = 3, an exhaustive search is possible, 
and for N t > 6, ambiguities in a do not have, according to our experience, a major impact on 
reconstruction. 

Starting point : object and aberration guess x and ck If a parametric model of the observed 
stellar source is not available, the object starting point is a mean square solution, from which we 
extract the positive part. The first step in the self-calibration block is a minimization w. r. t. a for 

X = Xq. 

5. Results 

This section presents some results of processing by the WlSARD algorithm, with both synthetic 
and experimental data. 

5. A. Processing of synthetic data 

The first example takes synthetic interferometric data that were used in the international Imaging 
Beauty Contest organized by P. Lawson for the IAU [23]. These data simulate the observation of 
the synthetic object shown in figure 3 with the NPOI [24] 6-telescope interferometer. The corre- 
sponding frequency coverage, shown in figure 3, has a structure in arcs of circles typical of the 
super-synthesis technique, which consists in repeating the measurements over several nights of 
observation so that the same baselines access different measurement spatial frequencies because 
of the Earth's rotation. In total, there are 195 square visibility modules and 130 closure phases, 
together with the associated variances. 

Six reconstructions obtained with WlSARD are shown in figure 4. On the upper row is a re- 
construction using a quadratic regularization based on a power spectral density model in 1/|«| 3 , 
for a weak, a strong and a correct regularization parameter. The latter gives a satisfactory level 
of smoothing but does not restore the peak in the center of the object. The peak is visible in the 
under-regularized reconstruction on the left but at the cost of too high a residual variance. 

The reconstruction presented on the lower row is a good trade-off between smoothing and 
restoration of the central peak thanks to the use of the white L 2 L™ prior term introduced in sec- 
tion 2.E. The automatically set parameters (eq. 18) are very satisfactory (left), and a light tuning 
(center and right) allow an even better reconstruction. The goodness of fit of the L 2 L™ reconstruc- 
tion can be appreciated in figure 5. The red crosses show the reconstructed visibility moduli (i.e. 
of the FT of the reconstructed object at the measurement frequencies) and the blue squares are 
the moduli of the measured visibilities. The difference between the two, weighted by 10 times the 
standard deviation of the moduli, is shown as the dotted line. The mean value of this difference is 
0.1, which shows a good fit (to within 1 a). 

5.B. Processing of experimental data 

Here, we present the reconstruction the star x Cygni from experimental data using the WlSARD 
algorithm. The data were obtained by S. Lacour and S. Meimon under the leadership of G. Perrin 
during a measuring campaign on the IOTA interferometer [25] in May 2005. As already mentioned, 
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Fig. 3. Synthetic object (right) and frequency coverage (left) from the Imaging 
Beauty Contest 2004 



each measurement has to be calibrated by observation of an object that acts as a point source at the 
instrument's resolving power. The calibrators chosen were HD 180450 and HD 176670. 

X Cygni is a Mira-type star, Mira itself being an example of such stars. Perrin et al. [26] propose 
a model of Mira-type stars, composed of a photosphere, an empty layer, and a thin molecular layer. 
The aim of the mission was to obtain images of % Cygni in the H band (1.65 microns ±175nm) 
and, in particular, to highlight possible assymmetric features in the structure of the molecular layer. 

Figure 6 shows, on the left, the u — v coverage obtained, i.e. the set of spatial frequencies 
measured, multiplied by the observation wavelength. Because the sky is habitually represented 
with the west on the right, the coordinates used are, in fact, —u, v. The domain of the accessible 
u — v plane is constrained by the geometry of the interferometer and the position of the star in the 
sky. The "hour-glass" shape is characteristic of the IOTA interferometer, and entails non-uniform 
resolution that affects the image reconstruction, shown on the right. The reconstructed angular field 
has sides of 60 milliarcseconds. In addition to the positivity constraint, the regularization term 
used is the L 2 Lf term described in section 2.E. The interested reader will find an astrophysical 
interpretation of this result in [27]. 

6. Concluding comments 

We have proposed a complete and precise self-calibration approach to optical interferometry im- 
age reconstruction. After pointing out the data model specificities in the optical long baseline 
interferometry context, we have emphasized the sources of under-determinations, which make a 
classical Bayesian criterion descent method critical. Namely, the main problems are the phase 
under-determination caused by turbulence effects, and, as noted only recently, the polar coordinate 
structure of the data model. 

We have built a specially-designed approximate myopic data-model, in order to derive a self 



14 

A 



Fig. 4. Reconstructions with WlSARD. Upper row : under-regularized quadratic 
model (left), over-regularized quadratic model (center), quadratic model with cor- 
rect regularization parameter (right). Lower row : white L 2 L™ model with auto- 
matically set scale and delta parameters (left), white L 2 L± model with half scale 
(center), white L 2 Lf model with half delta (right). Each image field is 12.1 x 12.1 
mas. 
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Fig. 5. Goodness of fit at WlSARD convergence. 
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Fig. 6. Frequency coverage (left) and reconstruction of the star x Cygni (right). 
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calibration method. Special care was given to the design of the second order statistics of the myopic 
model, an aspect which was ignored in previous related works. 

We have extended our previous work on polar data conversion 20 and proposed a convex approx- 
imation of the noise model which reduces the number of local minima of the criterion to minimize. 

We also addressed integer ambiguities induced by closure phase wrapping, which are classical 
when dealing with phase data, and have discussed their impact on the image reconstruction quality 
: for 3 telescope data, we have proposed an exhaustive search method, and we have witnessed 
that these ambiguities do not raise any particular problem when processing 6 or more telescope 
interferometer data. Concerning the remaining 4-5 telescope case, the work by Lannes [22] should 
be worth investigating. On the other hand, global minimization methods were left aside because of 
their intensive computation needs. As computer performance increases, these methods might be, 
in the years to come, an appropriate way to deal with local minima. 

All these developments allowed us to propose WlSARD, a self-calibration method for optical 
long baseline interferometry image reconstruction, and to demonstrate its efficiency on simulated 
data. 

Finally WlSARD was also used to successfully process real astronomical OLBI datatsets. These 
results were made possible thanks to a close partnership with the astronomers Sylvestre Lacour and 
Guy Perrin of the Observatoire de Paris Meudon, whithin the PHASE group (ONERA/LESIA). 
Indeed, an accurate astronomical model of the observed stellar object is a precious guideline for 
reconstructing a complex image from optical long baseline interferometric data. To the authors 
point of view, such a collaboration is essential to the success of OLBII techniques. 

A. The baseline and closure operators C and B 

Let N t be the number of telescopes of the interferometric array. We have the following definitions: 

(37) 



£ 2 =[-l 1] 





A 


— ljV t -l 


Idiv t -i 






O 


B N t -l _ 






A 


— -B/Vt-i 


Id(iV t -l)(JV t -2) 
2 



(38) 
(39) 



for N t > 3. 

In what follows, we prove that kerC = im B. 
We have C Nt B Nt = 0, so 

imBc ker C 



(40) 



It is straightforward to prove by recurrence that B Ni ■ l Nt — 0, which yields rank B Nt < N t — 1. 
Because B Nt contains Idjv t -i we gather: 



dim imB = rank B — N t — 1 . 
C/v t contains Id (jv t -i)(jv t -2) , which yields rankCjv t > ( Nt ~ 1 )( Nt - 2 ) 

2 

dimkerC^ < N t - 1 

With (40), (41) and 42), we gather: 

ker C = im B 



, or 



(41) 

(42) 
(43) 
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B. Characterization of the baseline phase independent operators 

Here, we prove that any continuous differentiable function / verifying property 8 

f(<!> + B<p) = f(<l>), V(0 )V >) 

is such that /(</>) = g(C<f>). C has more columns than rows, so its pseudo-inverse is defined by 
C ] = C T [CC T ] ~ X and verifies 

CC ] = Id (44) 

and thus 

CC ] C -C = 0^C (C ] C4> - (f>) = 0, V0 

( # } 3cp, {C ] C4> - (f>) = Bip, 
3tp, <f> = C ] C(f) - Bp, V0 

With this we obtain that any / verifying 8 is such that 

/(</>) = f(&C4> - Bip) = f(C*C4>) = g{Cct>). 

C. Wrapped kernel of operator C 

The kernel of operator C is given by ker C = m\B (equation 43). With dimensional arguments, 
it is easy to see that 

imB = imB 

where B is obtained by removing the first column of operator B, so we have 

kerC = imB (45) 
Let us now characterize the set of </> kcr such that : 

C0 kcr = [2tt] 

Because C has integer components, </> ker can be considered modulo 2tt. With equation 45, we 
obtain: 

3ai, kcr = C f (0 [2tt]) + Bar [2tt] (46) 

Because B has integer components, cti can be considered modulo 2n. The issue here is to evaluate 
the (0 [27r]) term, i.e. the value of (2tvk), with n any integer vector. 

Equations 37 show that C = [ M | Id ] . The integer vector fi = 

Cn = [ * | Id ] 







is then such that 







K,. 
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Then we have: 

C/X = k, =^cy = cc f K 

^C(C f K-/x) = 

=^3a 2 , C^k — n = Bol 2 

=^3a 2 , 27rC f K = 27r/x + B(27ra 2 ) 

^3a 2 , C f (0 [2tt]) + = B( 2ira 2 + a£ [2tt]. 

So equation 46 yields 

3a, kcr = Ba [2tt] (47) 
D. Square-root of a Gaussian distribution 

Let us suppose we measure the squared value s of a positive value a, with an additive Gaussian 
noise: 

s data = a 2 + ^noise^ (4g) 

s noiso |, em g mean Gaussian with the variance a 2 . Let a be the estimator of a from s data defined 
by 

^ ^ J v / s data > jf s data > q 

a ~ { else 

a can be seen as pseudo-data. The data model of a derived from equation 48 is not additive Gaus- 
sian. As will be shown in section E, an optimal Gaussian approximation of the data model of a 
would be: 

a = a + a noise , (49) 

with a noisc a Gaussian noise with a mean equal to < a > and a standard deviation ^Vax(a). 

We have studied the behavior of the mean < a > and standard deviation y / Vax(d) of this 
estimator for various values of a 2 , with a unit a s (see figs. 7 and 8). 

We can distinguish two regimes for < a >: 

• a low mean regime, where a 2 < a s /6 : a non negligible part of the distribution of s data around 
a 2 is in the negative domain. Because a estimates a null value for a when s data is negative, 
its mean will mainly depend on the width of the Gaussian wings. A good approximation of 

< a > is a/o" s /6; 

• a high mean regime, where a 2 > a s /6 : the most part of the distribution of s data around 
a 2 is in the positive domain. The fact that a estimates a null value for a when s data < 
does not impact its mean < a >, which is close to a. Because a is not known, we choose 

< a >= V s data . 

We can distinguish the same two regimes for ^/Var(a). However, the transition is around a s : 

• when a 2 < a s , the fact that a estimates a null value for a when s data is negative tends to 
diminish its standard deviation, which we approximate by y^Yar(a) ~ ^Jo~ s /2; 
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Fig. 7. Behavior of < a > in function of a 2 with a unit a s 
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Fig. 8. Behavior of y^Yax(a) in function of a 2 with a unit a s 



in the high mean regime, where a 2 > a s , the most part of the distribution of s data around a 2 
is in the positive domain, and y^Vai(a) is close to the classical expression. This expression 
corresponds to a first order expansion in a a : 

(a + a a ) 2 = a 2 + cr s ^ 2aa a ~ a s . 



a s /2a. Because a is not known, we choose A/Var(a) = a s /2\/ s data . 
We then propose the pseudo-data model 



data i noise 

Ct — CL I C6 



with a 



data 



Vs data , if s data > 
else 



and a Ilolsc a Gaussian noise with mean and standard devi- 
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ation defined by with : 



ct s /6, if s data < cr s /6 
\/s data , if s data > a s /6 

Jo~ a /2 if s data < ct s 



if 8 



data 



2 V / S data ' 

We also decide to discard the data such that s data < —a,. 



E. Cartesian Gaussian approximation to a polar Gaussian distribution 

If we define 

y a{t) (x,t)^H(t)x.e lB ^\ 

equation (31) reads 

a d ^(t) = \y a(t) \(x,t) + a™ e (t), 

data (t)iarg 2/a(t) ( a; ,O + ^ noise W, 
E.A. General expression 

With consider a polar distribution of a Gaussian vector y of modulus a and phase (ft: 

data = + </> noisc 



a noisc (t)~AT(0,i? aW ). 

noise (O~^(o,^ (t) ). 



(50) 



(51) 



a 



data 



a + a 



(52) 
(53) 



where noisc and a noise are mean real Gaussian vectors, of covariance matrices R a and R<p (the 
vectors noise and a llolse are supposed uncorrelated). 
With the definitions 



y = a exp icp 



y 1 



A ^data 



yrad = ^e^ir~~ e 



y 

noise —i</> 



(54) 



.n 



■noise A 



noise^— i</> 



2/rad 



2/tan 



we gather: 



^ ad = [a + a noisc ] COS0 11 
= [a + a noise ] sin0 n 



a 



(55) 



A complex vector is Gaussian if and only if each of its components is Gaussian. A complex is 
Gaussian if and only if, in any Cartesian basis, its two components are Gaussian. So y is Gaussian 
if and only if y nmsc is Gaussian, which is not the case[20]. In what follows, we show how to 
optimally approximate the distribution of y nmsc by a Gaussian distribution. 
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E.B. Gaussian Approximation 

We characterize our Cartesian additive Gaussian approximation, i.e. its mean (y noise ) and covari- 
ance R= n0 i SC , by minimizing the Kullback-Leibler distance between the two noise distributions, 
which gives [20] : 



= noise ^ = E 

E 



^rad 



y 



tan 



2/rad 
2/tan 



VxaA ^rad 



2^tan 



2^tan 



VxaA ^rad 



2/ tan 



2^tan 



(56) 



and we define 



-f^l^noisc 



R 



rad,rad -Rrad,tan 



rad.tan Rtan,tan 



For a mean Gaussian vector noise f covariance matrix R^, 



E{sm<j)™ isc } = 
£{cosC is °} =exp 



R. 



■4> 



E {sin 0™ isc sin 0™ ise } = sinh R, 

{cos^'cos^ 86 } = cos h 
£{cos0™ ise sin0™ ise 
By combining equations. 56, 



* y -exp — 



R <t>ij ■ ex P g 

}=0 

54, 55 and 57, we obtain: 



[Rrad,rad 



e 2—1 



(cosh-R^. - lj + Ra l3 coshR^. 



R <t>u +R <i>ij 



\R- 



rad,tan 



\R 



tan, tan 







(cuaj + R aij ) sinhR^.. • e 



<r n "33 



(57) 



(58) 



E. C. The scalar case 

Now, we make the additional assumption that both noise and a noise are decorrelated, i.e. 

R a = Diag{cr2J 
Rj, = Diag {o-JJ 
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We obtain: 



#rad,rad = Diag {o^J 
#tan,tan = Diag {(?tan,i} 
-Rrad,tan 



with 



2 2 V 2 ' (59) 

<,= |(l-e-*)+^(l-eX, 

In this case, we can plot for one complex visibility the true noise distribution - i.e. a Gaussian 
noise in phase and modulus - and our Gaussian approximation (see fig. 1). 
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